Structural characteristics in network control of molecular multiplex networks

Numerous real-world systems can be naturally modeled as multilayer networks, providing an efficient tool to characterize these complex systems. Although recent progress in understanding the controlling of synthetic multiplex networks, how to control real multilayer systems remains poorly understood. Here, we explore the controllability and energy requirement of molecular multiplex networks coupled by transcriptional regulatory network (TRN) and protein-protein interaction (PPI) network from the perspective of network structural characteristics. Our findings reveal that the driver nodes tend to avoid essential or pathogen-related genes. However, imposing external inputs on these essential or pathogen-related genes can remarkably reduce the energy cost, implying their crucial role in network control. Moreover, we find that the minimal driver nodes, as well as the energy required, are associated with disassortative coupling between TRN and PPI networks. Our results provide a comprehensive understanding of the roles of genes in biology and network control across several species.

Biological systems, with large-scale and complicated interactions, can also provide graphs of whole cells and entire organisms. The study of biological systems based on network concepts has recently attracted much attention [41][42][43][44]. With the further goal of controlling systems, control theory has also been applied to biological networks to reveal the underlying mechanisms behind life processes [43][44][45][46][47][48]. Therefore, many network-based approaches have been developed to analyze the characteristics of biological networks, e.g., identifying disease genes [49] and drug-target interactions [50].
The functioning of many systems usually requires coupling between different types of networks [51][52][53][54][55][56][57][58]. Biological networks also function as consequences of the complex interactions between different molecular networks [41,[59][60][61]. For instance, proteins in a protein-protein interaction (PPI) network are translated from genes in a transcriptional regulatory network (TRN) [59]. Recently, Mahajan et al. [59] explored the interactions between the TRN and PPI networks of different species and revealed the impact of multiplex architectures on network robustness. They found that the functionally essential genes and proteins are situated in important parts of the multiplex networks.
Those coupled structures of multilayers not only influence the robustness of networks but also have an effect on other functions and abilities of molecular networks, e.g., controllability and control energy. In addition, our understanding of how those essential genes and proteins impact the network control remains a gap. In this study, based on the principles of control for single-layer biological networks, we examine the associations between the functional characteristics of genes and their roles in network control. We show that imposing external inputs on essential or pathogen-related genes can efficiently reduce the required control energy, even though the minimal driver node set tends to avoid these genes. Moreover, we find that a negative correlation between the TRN and PPI layers can simultaneously decrease the number of driver nodes and the required energy.

Model
Here, a transcriptional regulatory network (TRN) represents the interactions among transcription factors and their target genes. An edge in the TRN encodes the direct interactions between a transcription factor and its target genes. In a protein-protein interaction (PPI) network, each node represents a protein, and an undirected link denotes a physical or binding interaction. The coupling links between layers represent the interactions between the genes in the TRN and the proteins in the PPI, in which the proteins translated from genes can also regulate other genes. Edges in the TRN encode direct interactions between a transcription factor and its target genes, in which the transcription factors are considered as genes in our model. The couplings between the TRN and PPI layers form a one-to-one correspondence, i.e., a gene in the TRN is connected to a corresponding protein in the PPI layer [59]. A schematic diagram of the constructed multiplex networks is shown in Fig 1. The datasets are provided by Mahajan and Dar [59], who collected data from 7 species: H. pylori [62,63], M. tuberculosis [64,65], E. coli [66][67][68], C. elegans [68][69][70], A. thaliana [68,69,71], M. musculus [68,69,72], and H. sapiens [68,69,72]. Since TRN and PPI networks have different genome and proteome coverage levels, only the genes and proteins present in both layers are considered in our analysis.
Formally, the dynamics of each node in the TRN-PPI multiplex network are described by where z(t) = (z 1 (t), z 2 (t), � � �, z N+ N (t)) T denotes the states of the 2N nodes at time t. N represents the sizes of both the TRN and PPI network. f(*) = [f 1 (*), f 2 (*), � � �, f N+ N (*)] T captures the nonlinear dynamics of each node, and v(t) = [v 1 (t), v 2 (t), � � �, v M (t)] T denotes the external inputs imposed on the multiplex network. This model can be used to conceptualize the regulatory interactions between genes and proteins, in which some genes can be translated into proteins. For simplicity, assume that the system is at a fixed point z * , where f(z * , v * , t) = 0, and x(t) = z(t) − z * , u(t) = v(t) − v * . Eq 1 can be linearized as [5] _ xðtÞ ¼ where is the global adjacency matrix of the network and captures the interactions between the 2N nodes. A 11 (A 22 ) describes the connections between the nodes within TRN (PPI), and A 12 (A 21 ) captures the intraconnections between the TRN and PPI network. B � @f @v j z * ;v * represents how the external inputs are imposed on nodes. u(t) is the external input.
Note that if the linear system in Eq 2 is locally controllable along a trajectory in the state space, then the corresponding nonlinear system in Eq 1 is also controllable along the same trajectory [5,73]. In addition, the linear control predictions are consistent with the control of nonlinear dynamics [73]. Hereafter, to apply the controllability and control energy framework, we focus on the linear system in Eq 2.

Minimal driver nodes and control energy
A system is controllable if it can be driven from any initial state to any desired state with finite external inputs within finite time. External signals can be applied to nodes in the network, these nodes controlled by external signals are driver nodes, and the minimal number of driver nodes measures the controllability of a network. This is defined as N D , and n D ¼ N D N . According to the exact controllability framework [14], N D can be calculated as where λ i (i = 1, 2, � � �, N) represent the eigenvalues of the adjacency matrix and μ(λ i ) is the geometric multiplicity. Moreover, the minimal driver node set can be identified in B to satisfy the following equation: where A is the adjacency matrix of the network, λ M refers to the eigenvalue obtained according to the maximum geometric multiplicity μ(λ M ) and I N is the identity matrix. It has been noted that the rank of matrix [A − λ M I N , B] is determined by the number of linearly independent rows. By performing an elementary column transformation on the matrix A − λ M I N , we can obtain these linearly dependent rows. Therefore, the external inputs described by B should be imposed on the rows to eliminate all linear relations and make the matrix [A − λ M I N , B] fully ranked, and the corresponding minimal driver node set can be identified [14]. This method based on exact controllability is applied to arbitrary networks. On the basis of optimal control theory [74], the energy required to control the system is where t 0 is the initial time and t f is the final time. If the initial state x 0 = 0 at t 0 = 0, the minimal control energy is where Wð0; t f Þ ¼ R t f 0 e At BB T e A T t dt is the Gramian matrix. As the control energy decays to a nonzero stationary value as the time t increases, the control energy discussed here is E(t f ! 1), and the Gramian matrix is W(0, t f ! 1). We set the elements as A ii ¼ À ðd þ P N j¼1 A ij Þ, where δ = 0.25 to ensure the stability of the whole system [25]. This self-loop can be considered as one of the components which affect the state of the gene or protein in the regulatory process.

Degree-degree correlation
The associativity between the TRN and PPI layers is measured by Pearson's correlation coefficient ρ [75] r ¼ corðk out ; KÞ ¼ P n i¼0 ðk out ðiÞ À � k out Þ P n i¼0 ðKðiÞ À � K Þ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi P n i¼0 ðk out ðiÞ À � k out Þ 2 q ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where k out is the out-degree of the nodes in the TRN, K is the degree of the nodes in the PPI network and n is the number of nodes in each layer.

Simulated annealing
To finely tune the degree-degree correlation coefficient ρ of the multiplex networks, we adopt the simulated annealing algorithm [59,76], which is used to change the value of the degreedegree correlation coefficient ρ by shuffling gene labels [59].
1. Randomly shuffle the gene labels in the TRN, and keep the protein labels in the PPI network unchanged. Then, calculate the absolute difference between the shuffled and desired degree-degree correlations as D P ¼ jcor P k;K À cor D k;K j, where cor P k;K is the present correlation between the nodes with k out degrees in the TRN and nodes with K degrees in the PPI. cor D k;K is the desired correlation. Save the present gene labeling of the TRN.
2. Selecting M genes in the TRN network randomly (here we set M = 10), and shuffle their labels. Save the current labeling of the TRN as a new sample. Then, define the new difference as D * ¼ jcor * k;K À cor D k;K j, where cor * k;K is the degree-degree correlation of the present network after randomly shuffling 10 gene labels, and cor D k;K is the desired correlation. 3. Calculate the difference Δ = Δ P − Δ * , and accept the new labeling in step 2 with probability: where T = T 0 e −λL is the temperature, L is the number of iterations and λ is the rate parameter. The parameters are set as T 0 = 1, 000 and λ = 0.01 [59].
4. If Δ P is smaller than the defined value (here we set the defined value as 0.01), then stop; otherwise, repeat the process from step 2.

Driver node distributions in TRN-PPI multiplex networks
To reflect the controllability of these TRN-PPI multilayer molecular networks, we first examine the minimal driver nodes N D to achieve full control of the network for each species [14] (see Methods). The characteristics of the networks for each species are shown in Table 1. We find that most networks display lower n D (*0.15) than the n D (*0.3) of the H. pylori multiplex network and the values for some other real networks [14]. This indicates that we need to independently control approximately 15% nodes to fully control them. With its small average degree, H. pylori yields a high n D . This is consistent with previous findings that sparse networks are more difficult to control than dense networks [11]. Then, we examine the driver node distributions between the two layers, implying that these driver nodes do not have any particular preference regarding the two layers for most species (Fig 2(b)-2(h)). Strictly speaking, slightly more driver nodes are contained in the TRN than in the PPI layer for the A. thaliana and M. musculus species.

Roles of essential and pathogen-related genes in network controllability
The essential and pathogen-related genes are critical for the survival and health status of an organism; thus, they convey particular topological characteristics in the corresponding multiplex networks [59]. Hence, we examine whether these essential or pathogen-related genes are associated with a prominent network control role. To achieve this, we adopt the gene categories in Ref. [59], where the essential genes for the human species are collected from the Online GEne Essentiality (OGEE) database [77,78], and five human pathogens are collected from the publicly available database named HPIDB 3.0 [79,80]. Then, we compare the fractions of essential or pathogen-related genes and the non-essential or nonpathogen-related genes are selected as the driver nodes. Fig 3 shows that the proportion of essential or pathogen-related genes selected as driver nodes is significantly lower than that of non-essential or nonpathogen-related genes (p-value<0.001), regardless of the considered species. It indicates that the driver nodes tend to avoid the essential or pathogen-related genes in TRN-PPI networks.

Effect of interlayer coupling on network control
We further explore the impact of interlayer correlation on network controllability. Note that the interactions between the TRN and PPI network have already been established by their biological relationships. The purpose of our analysis is to reveal whether the underlying coupling pattern in reality can be partially explained in a network control manner. Therefore, we finely tune the correlation between the TRN and PPI layers through a simulated annealing algorithm [59,76] (see Methods), finding that more driver nodes are required as the network transitions from disassortative to assortative (see Fig 4). We note that the real interlayer correlations for the four species are mostly positive [59]. We calculate the real interlayer correlations and the actual value of n D and compare the actual value of n D with the corresponding 100 random realizations. The results show that the actual value of n D is close to the 100 random realizations. We also assess how degree correlation determines the energy required for network control. We again finely tune the interlayer correlation by rewiring the links. To eliminate the impact of utilizing different driver node sets, all nodes are independently controlled. We find that the control energy in terms of degree correlation displays a trajectory similar to that observed in n D (in Fig 4), i.e., multiplex networks with disassortative couplings between their layers are easier to control than those with assortative couplings (see Fig 5). The real values of E for the original connections between the layers are also calculated (see Fig 5). Though the values of ρ are the same for both the original networks and the simulated models, the real values of n D and E are not the same as the results obtained by simulation. Since the topological structure of  . sapiens, (b) Herpes, (c) Papillomaviruses, (d) HIV, (e) Yersinia, (f) Dengue. We obtain 100 minimal driver node sets and calculate the fractions of essential or pathogen-related genes among the driver node sets of all essential or pathogen-related genes (left, blue box), as well as the fractions of non-essential or nonpathogen-related genes among the driver node set of all non-essential or nonpathogen-related genes (right, yellow box), respectively. Then we obtain the average value. The red crosses represent the outliers, and the p-value is calculated by the t-test.  the network is changed in every independent realization, the driver nodes and control energy are all changed. The errors are positive and negative; this is allowable.

Essential and pathogen-related genes are critical for the control energy
Finally, we examine whether these essential genes can fill critical roles regarding the energy required for TRN-PPI multiplex network control. Subsequently, we propose three driver node selection strategies with the given N D .
(1) All essential or pathogen-related genes are selected, where N E are selected as driver nodes, and then the remaining N D − N E driver nodes are selected randomly.
(2) N D driver nodes are selected based on the in-degrees of all nodes in the TRN layer in descending order. (3) All N D driver nodes are selected at random. Interestingly, we find that the control energies of these three strategies exhibit quite consistent implications: controlling those essential or pathogen-related genes yields the lowest energy required (see Fig 6).
To further explore the results, we find that the essential or pathogen-related genes are located in critical positions in the network structure. The control energy flows through the network by the control chain, which starts from a driver node and ends at a non-driver node along the shortest path between them [27]. The energy for controlling the non-driver nodes increases exponentially with the distance between the non-driver nodes and driver nodes. Thus, a shorter control chain leads to lower control energy.
Furthermore, the betweenness centrality (BC) is an index used to evaluate the importance of a nodal structure; its value is the fraction of shortest paths in the network going through the given node [27]: BC i ¼ P i6 ¼s6 ¼t ðn i st Þ=g st . g st is the number of all shortest paths from node s to node t, and n i st is the number of shortest paths through node i among the g st shortest paths from node s to node t. A node with a larger BC value has more paths between the node pairs going through it. Then, we find that the essential/pathogen-related genes are the nodes with larger BC values. The results in Table 2 show that the BC values of the essential/pathogenrelated genes B E are larger than those of the non-essential genes B NE for most species. This indicates that the essential genes are the nodes that are more likely to be located in the middle positions of the paths between the driver nodes and non-driver nodes. Therefore, selecting these essential nodes as driver nodes reduces the length of the control chains, and the required energy becomes lower.

Conclusion
Interactions between networks are ubiquitous in molecular networks. Here, we focus on multiplex networks consisting of transcriptional regulatory network (TRN) and protein-protein interaction (PPI) networks to explore their minimal driver nodes and the energy required for achieving full control. The results indicate that the driver nodes have no obvious preference for one layer over the other, and the driver nodes are more likely to avoid the essential or pathogen-related genes than other genes. In addition, the TRN-PPI networks with positive interlayer degree-degree correlations need more driver nodes and more energy to achieve full control. By comparing different driver node selection strategies, we find that the TRN-PPI networks require the lowest energy to reach the desired state by driving essential or pathogenrelated genes. Our work bridges the gap between the structural characteristics of molecular multiplex networks and network control. It will be helpful for understanding the essential genes' functions in biology and network control.